Activation
=================

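Applies an activation function to every element of the input array. The activation function is selectable; the available functions are described below.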

- ``Relu`` - Standard ReLU function.
    
    .. math::
    
        output_i = \max(0, input_i)

- ``Relu6`` - Standard ReLU with the output capped at 6.
    
    .. math::
    
        output_i = \min(\max(0, input_i),6)

- ``Clip`` - Clips the input to the interval [min_val, max_val].

    .. math::

        output_i = \min(\max(input_i, \mathrm{min\_val}), \mathrm{max\_val})

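- ``LRelu`` - Leaky Rectified Linear Unit. It is linear for positive inputs and keeps a small slope for negative inputs, which avoids the "dying neuron" problem of standard ReLU.

    .. math::

        output_i =
        \begin{cases}
        input_i, & input_i \ge 0 \\
        \alpha \cdot input_i, & input_i < 0
        \end{cases}

    where :math:`\alpha` (parameter ``alpha``) is the slope applied to negative inputs.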

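- ``Sigmoid`` - A widely used smooth nonlinear activation (also called the logistic function) that maps any real number into the interval :math:`(0, 1)`. It is commonly used in the output layer of binary classifiers, where the output can be read as a probability.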

    .. math::
        
        output_i = \frac{1}{1 + e^{-input_i}}

- ``Tanh`` - Hyperbolic tangent activation, with output range :math:`(-1, 1)`.

    .. math::

        output_i = \tanh(input_i) = \frac{e^{input_i} - e^{-input_i}}{e^{input_i} + e^{-input_i}}

- ``HSigmoid`` - Hard Sigmoid, a piecewise-linear approximation of ``Sigmoid`` that is cheaper to compute.

    .. math::
    
        output_i = \text{clip}\left(\frac{input_i + 3}{6}, 0, 1\right)
        
    where ``clip(a, 0, 1)`` limits ``a`` to the interval :math:`[0, 1]`.

- ``Swish`` - A self-gated activation function proposed by Google that combines ``Sigmoid`` gating with linear behavior; it is smooth and non-monotonic.
    
    .. math::
    
        output_i = input_i \cdot \sigma(input_i) = \frac{input_i}{1 + e^{-input_i}}
    
    where :math:`\sigma(x)` is the standard ``Sigmoid`` function. ``Swish`` has been reported to match or outperform ``ReLU`` in deep networks.

- ``HSwish`` - Hard Swish, a piecewise-linear approximation of ``Swish`` that is cheap to compute and widely used in mobile models such as MobileNetV3.

    .. math::
    
        output_i = input_i \cdot \text{clip}\left(\frac{input_i + 3}{6}, 0, 1\right) 

    where ``clip(a, 0, 1)`` limits ``a`` to the interval :math:`[0, 1]`.

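- ``HardTanh`` - Hard Tanh, a piecewise-linear approximation of ``Tanh``. It is simple to compute, has stable gradients, and is often used in quantized or lightweight networks.

    .. math::

        output_i = \mathrm{clip}(input_i, \mathrm{min\_val}, \mathrm{max\_val})

    where ``clip(x, min_val, max_val)`` outputs ``min_val`` when :math:`x < \mathrm{min\_val}`, ``max_val`` when :math:`x > \mathrm{max\_val}`, and :math:`x` otherwise.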

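- ``Gelu`` - Gaussian Error Linear Unit, a smooth nonlinear activation that combines ReLU-like behavior with a probabilistic interpretation. The function supports both an exact mode and an approximate mode. The tanh approximation, given by *Hendrycks & Gimpel (2016)*, replaces the exact form :math:`output_i = input_i\,\Phi(input_i)`; it is faster to compute with very little loss of accuracy.

    .. math::

        output_i =
        \begin{cases}
        0.5\,input_i \Bigl[ 1 + \tanh\!\Bigl(
            \sqrt{\frac{2}{\pi}}\,(input_i + 0.044715\,input_i^3)
        \Bigr) \Bigr], & \text{approximate} \ne 0, \\[6pt]
        input_i \,\Phi(input_i)
        = \tfrac{1}{2}\,input_i \Bigl[
            1 + \mathrm{erf}\!\Bigl(\tfrac{input_i}{\sqrt{2}}\Bigr)
        \Bigr], & \text{approximate} = 0.
        \end{cases}

    where :math:`\Phi(x)` is the cumulative distribution function of the standard normal distribution, and ``approximate`` is the mode argument of the ``*_gelu_*`` functions.
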
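- ``Softplus`` - A smooth approximation of ``ReLU`` that remains differentiable at zero.

    .. math::

        output_i =
            \begin{cases}
            input_i, & input_i \gt 88.0 \\
            \ln(1 + e^{input_i}), & \text{otherwise}
            \end{cases}

    For inputs above 88.0 the input is returned directly: there :math:`\ln(1 + e^{input_i})` differs from :math:`input_i` by a negligible amount, and :math:`e^{input_i}` would otherwise approach the fp32 overflow limit (:math:`\ln(\mathrm{FLT\_MAX}) \approx 88.72`).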

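- ``Elu`` - Exponential Linear Unit. It is linear for positive inputs and saturates exponentially toward :math:`-\alpha` for negative inputs, which mitigates the "dying neuron" problem of ReLU.

    .. math::

        output_i =
        \begin{cases}
        input_i, & input_i \ge 0 \\
        \alpha (e^{input_i} - 1), & input_i < 0
        \end{cases}

    where :math:`\alpha` (parameter ``alpha``) is a hyperparameter, typically :math:`\alpha = 1.0`.
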
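- ``Celu`` - Continuously Differentiable Exponential Linear Unit, a variant of ``Elu`` that is continuously differentiable at zero.

    .. math::

        output_i =
        \begin{cases}
        input_i, & input_i \ge 0 \\
        \alpha (e^{\frac{input_i}{\alpha}} - 1), & input_i < 0
        \end{cases}

    where :math:`\alpha` (parameter ``alpha``) is a tunable hyperparameter that controls the smoothness of the negative region.
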
- ``HardShrink`` - Hard Shrinkage, used to sparsify outputs.

    .. math::

        output_i =
        \begin{cases}
        input_i, & \text{if } |input_i| > \lambda \\
        0, & \text{otherwise}
        \end{cases}

    where :math:`\lambda` (parameter ``lambd``) is the threshold.

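- ``SoftShrink`` - Soft Shrinkage. Like ``HardShrink`` it zeroes inputs within :math:`[-\lambda, \lambda]`, but it also shrinks the remaining values toward zero by :math:`\lambda`, so the output is continuous.

    .. math::

        output_i =
        \begin{cases}
        input_i - \lambda, & \text{if } input_i > \lambda \\
        input_i + \lambda, & \text{if } input_i < -\lambda \\
        0, & \text{otherwise}
        \end{cases}

    where :math:`\lambda` (parameter ``lambd``) is the threshold.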

- ``SoftsignOpt`` - Optimized Softsign, a smooth squashing function that maps the input into the bounded interval :math:`(-1, 1)`.
    
    .. math::
    
        output_i = \frac{input_i}{1 + |input_i|}
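
For checking results on the host, the smooth functions above can be written as scalar fp32 reference functions. The following is a minimal sketch built on the C standard ``<math.h>``; the ``ref_*`` names are hypothetical and are not part of ``activation.h``, and a nonzero ``approximate`` is assumed to select the tanh approximation of Gelu.

.. code-block:: c

    #include <math.h>

    /* Hypothetical fp32 scalar references for the smooth activations,
     * not part of activation.h. */
    static float ref_sigmoid(float x) { return 1.0f / (1.0f + expf(-x)); }

    static float ref_tanh(float x) { return tanhf(x); }

    static float ref_swish(float x) { return x * ref_sigmoid(x); }

    /* Assumption: approximate != 0 selects the tanh approximation,
     * approximate == 0 selects the exact erf form. */
    static float ref_gelu(float x, int approximate) {
        if (approximate) {
            const float k = 0.7978845608f; /* sqrt(2 / pi) */
            return 0.5f * x * (1.0f + tanhf(k * (x + 0.044715f * x * x * x)));
        }
        return 0.5f * x * (1.0f + erff(x / sqrtf(2.0f)));
    }

    static float ref_softplus(float x) {
        /* Above 88.0 expf(x) nears FLT_MAX, so return x directly;
         * log1pf(expf(x)) computes ln(1 + e^x). */
        return x > 88.0f ? x : log1pf(expf(x));
    }

    static float ref_elu(float x, float alpha) {
        return x >= 0.0f ? x : alpha * (expf(x) - 1.0f);
    }

    static float ref_celu(float x, float alpha) {
        return x >= 0.0f ? x : alpha * (expf(x / alpha) - 1.0f);
    }

    static float ref_softsign(float x) { return x / (1.0f + fabsf(x)); }

How close the library output should be to these values depends on each platform's accuracy, which this section does not specify.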

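Inputs:
    - **Input0** - Address of the input data.
    - **length** - Number of elements in the array.
    - **args (some functions only)** - Function-specific parameters: ``min_val`` and ``max_val`` (Clip, HardTanh), ``alpha`` (LRelu, Elu, Celu), ``approximate`` (Gelu), and ``lambd`` (HardShrink, SoftShrink).
    - **core_mask (int)** - Core mask; only present in the shared-storage (``_s``) functions.

Output:
    - **output** - Address of the result buffer. Its element type is given by each signature; for example, ``i8_sigmoid_s`` reads ``int8_t`` input and writes ``float`` output.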
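
The calling convention above (input address, output address, element count, then any function-specific arguments) can be mirrored on the host with plain loops, which is handy for checking results. The sketch below covers the piecewise-linear functions in fp32; the ``ref_*`` names are hypothetical and not part of the library.

.. code-block:: c

    #include <math.h>

    /* Hypothetical host-side references, not part of activation.h.
     * Argument order mirrors the fp_*_p functions: input address,
     * output address, length, then the function's parameters. */
    static float ref_clampf(float x, float lo, float hi) {
        return fminf(fmaxf(x, lo), hi);
    }

    void ref_fp_relu6(const float *in, float *out, int length) {
        for (int i = 0; i < length; ++i) out[i] = ref_clampf(in[i], 0.0f, 6.0f);
    }

    /* Clip and HardTanh share this formula. */
    void ref_fp_clip(const float *in, float *out, int length, float min_val, float max_val) {
        for (int i = 0; i < length; ++i) out[i] = ref_clampf(in[i], min_val, max_val);
    }

    void ref_fp_lrelu(const float *in, float *out, int length, float alpha) {
        for (int i = 0; i < length; ++i) out[i] = in[i] >= 0.0f ? in[i] : alpha * in[i];
    }

    void ref_fp_hsigmoid(const float *in, float *out, int length) {
        for (int i = 0; i < length; ++i)
            out[i] = ref_clampf((in[i] + 3.0f) / 6.0f, 0.0f, 1.0f);
    }

    void ref_fp_hswish(const float *in, float *out, int length) {
        for (int i = 0; i < length; ++i)
            out[i] = in[i] * ref_clampf((in[i] + 3.0f) / 6.0f, 0.0f, 1.0f);
    }

    void ref_fp_hardshrink(const float *in, float *out, int length, float lambd) {
        for (int i = 0; i < length; ++i) out[i] = fabsf(in[i]) > lambd ? in[i] : 0.0f;
    }

    void ref_fp_softshrink(const float *in, float *out, int length, float lambd) {
        for (int i = 0; i < length; ++i) {
            float x = in[i];
            out[i] = x > lambd ? x - lambd : (x < -lambd ? x + lambd : 0.0f);
        }
    }

Running a reference and the corresponding library function on the same input array and comparing element by element gives a simple regression check.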

Supported platforms:
    ``FT78NE``
    ``MT7004``

.. note::
    - FT78NE supports int8 and fp32 (``i8_`` and ``fp_`` functions).
    - MT7004 supports fp16 and fp32 (``hp_`` and ``fp_`` functions).

**Shared-storage version:**

.. c:function:: void i8_relu_s(int8_t* Input0, int8_t* output,int length, int core_mask)
.. c:function:: void fp_relu_s(float* Input0, float* output,int length, int core_mask)
.. c:function:: void hp_relu_s(half* Input0, half* output,int length, int core_mask)
.. c:function:: void i8_relu6_s(int8_t* Input0, int8_t* output,int length, int core_mask)
.. c:function:: void fp_relu6_s(float* Input0, float* output,int length, int core_mask)
.. c:function:: void hp_relu6_s(half* Input0, half* output,int length, int core_mask)
.. c:function:: void i8_clip_s(int8_t* Input0, int8_t* output,int length, int8_t min_val, int8_t max_val, int core_mask)
.. c:function:: void fp_clip_s(float* Input0, float* output,int length, float min_val, float max_val, int core_mask)
.. c:function:: void hp_clip_s(half* Input0, half* output,int length, half min_val, half max_val, int core_mask)
.. c:function:: void i8_lrelu_s(int8_t* Input0, int8_t* output,int length, float alpha, int core_mask)
.. c:function:: void fp_lrelu_s(float* Input0, float* output,int length, float alpha, int core_mask)
.. c:function:: void hp_lrelu_s(half* Input0, half* output,int length, half alpha, int core_mask)
.. c:function:: void i8_sigmoid_s(int8_t* Input0, float* output,int length, int core_mask)
.. c:function:: void fp_sigmoid_s(float* Input0, float* output,int length, int core_mask)
.. c:function:: void hp_sigmoid_s(half* Input0, half* output,int length, int core_mask)
.. c:function:: void i8_tanh_s(int8_t* Input0, float* output,int length, int core_mask)
.. c:function:: void fp_tanh_s(float* Input0, float* output,int length, int core_mask)
.. c:function:: void hp_tanh_s(half* Input0, half* output,int length, int core_mask)
.. c:function:: void i8_hsigmoid_s(int8_t* Input0, float* output,int length, int core_mask)
.. c:function:: void fp_hsigmoid_s(float* Input0, float* output,int length, int core_mask)
.. c:function:: void hp_hsigmoid_s(half* Input0, half* output,int length, int core_mask)
.. c:function:: void i8_swish_s(int8_t* Input0, float* output,int length, int core_mask)
.. c:function:: void fp_swish_s(float* Input0, float* output,int length, int core_mask)
.. c:function:: void hp_swish_s(half* Input0, half* output,int length, int core_mask)
.. c:function:: void i8_hswish_s(int8_t* Input0, float* output,int length, int core_mask)
.. c:function:: void fp_hswish_s(float* Input0, float* output,int length, int core_mask)
.. c:function:: void hp_hswish_s(half* Input0, half* output,int length, int core_mask)
.. c:function:: void i8_hardtanh_s(int8_t* Input0, int8_t* output,int length, int8_t min_val, int8_t max_val, int core_mask)
.. c:function:: void fp_hardtanh_s(float* Input0, float* output,int length, float min_val, float max_val, int core_mask)
.. c:function:: void hp_hardtanh_s(half* Input0, half* output,int length, half min_val, half max_val, int core_mask)
.. c:function:: void i8_gelu_s(int8_t* Input0, float* output,int length, int approximate, int core_mask)
.. c:function:: void fp_gelu_s(float* Input0, float* output,int length, int approximate, int core_mask)
.. c:function:: void hp_gelu_s(half* Input0, half* output,int length, int approximate, int core_mask)
.. c:function:: void i8_softplus_s(int8_t* Input0, float* output,int length, int core_mask)
.. c:function:: void fp_softplus_s(float* Input0, float* output,int length, int core_mask)
.. c:function:: void hp_softplus_s(half* Input0, half* output,int length, int core_mask)
.. c:function:: void i8_elu_s(int8_t* Input0, float* output,int length, float alpha, int core_mask)
.. c:function:: void fp_elu_s(float* Input0, float* output,int length, float alpha, int core_mask)
.. c:function:: void hp_elu_s(half* Input0, half* output,int length, half alpha, int core_mask)
.. c:function:: void i8_celu_s(int8_t* Input0, float* output,int length, float alpha, int core_mask)
.. c:function:: void fp_celu_s(float* Input0, float* output,int length, float alpha, int core_mask)
.. c:function:: void hp_celu_s(half* Input0, half* output,int length, half alpha, int core_mask)
.. c:function:: void i8_hardshrink_s(int8_t* Input0, int8_t* output,int length, int8_t lambd, int core_mask)
.. c:function:: void fp_hardshrink_s(float* Input0, float* output,int length, float lambd, int core_mask)
.. c:function:: void hp_hardshrink_s(half* Input0, half* output,int length, half lambd, int core_mask)
.. c:function:: void i8_softshrink_s(int8_t* Input0, int8_t* output,int length, int8_t lambd, int core_mask)
.. c:function:: void fp_softshrink_s(float* Input0, float* output,int length, float lambd, int core_mask)
.. c:function:: void hp_softshrink_s(half* Input0, half* output,int length, half lambd, int core_mask)
.. c:function:: void i8_softsignopt_s(int8_t* Input0, float* output,int length, int core_mask)
.. c:function:: void fp_softsignopt_s(float* Input0, float* output,int length, int core_mask)
.. c:function:: void hp_softsignopt_s(half* Input0, half* output,int length, int core_mask)

**C example:**

    .. code-block:: c
        :linenos:
        :emphasize-lines: 10

        // FT78NE example
        #include <stdio.h>
        #include <activation.h>

        int main(int argc, char* argv[]) {
            float *input0 = (float *)0xA0000000;   // input resides in DDR
            float *output = (float *)0xC0000000;
            int length = 1000;
            int core_mask = 0xff;
            fp_tanh_s(input0, output, length, core_mask);
            return 0;
        }
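
After ``fp_tanh_s`` returns, the output buffer can be spot-checked on the host against the C standard library's ``tanhf``. A minimal sketch follows; ``max_tanh_error`` is a hypothetical helper, not part of ``activation.h``, and the acceptable error bound depends on the library's accuracy, which this section does not specify.

.. code-block:: c

    #include <math.h>

    /* Hypothetical helper, not part of activation.h: returns the largest
     * absolute difference between out[i] and tanhf(in[i]). */
    float max_tanh_error(const float *in, const float *out, int length) {
        float max_err = 0.0f;
        for (int i = 0; i < length; ++i) {
            float err = fabsf(out[i] - tanhf(in[i]));
            if (err > max_err) max_err = err;
        }
        return max_err;
    }

For the example above this would be called as ``max_tanh_error(input0, output, length)`` after ``fp_tanh_s``.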

**Private-storage version:**

.. c:function:: void i8_relu_p(int8_t* Input0, int8_t* output,int length)
.. c:function:: void fp_relu_p(float* Input0, float* output,int length)
.. c:function:: void hp_relu_p(half* Input0, half* output,int length)
.. c:function:: void i8_relu6_p(int8_t* Input0, int8_t* output,int length)
.. c:function:: void fp_relu6_p(float* Input0, float* output,int length)
.. c:function:: void hp_relu6_p(half* Input0, half* output,int length)
.. c:function:: void i8_clip_p(int8_t* Input0, int8_t* output,int length, int8_t min_val, int8_t max_val)
.. c:function:: void fp_clip_p(float* Input0, float* output,int length, float min_val, float max_val)
.. c:function:: void hp_clip_p(half* Input0, half* output,int length, half min_val, half max_val)
.. c:function:: void i8_lrelu_p(int8_t* Input0, int8_t* output,int length, float alpha)
.. c:function:: void fp_lrelu_p(float* Input0, float* output,int length, float alpha)
.. c:function:: void hp_lrelu_p(half* Input0, half* output,int length, half alpha)
.. c:function:: void i8_sigmoid_p(int8_t* Input0, float* output,int length)
.. c:function:: void fp_sigmoid_p(float* Input0, float* output,int length)
.. c:function:: void hp_sigmoid_p(half* Input0, half* output,int length)
.. c:function:: void i8_tanh_p(int8_t* Input0, float* output,int length)
.. c:function:: void fp_tanh_p(float* Input0, float* output,int length)
.. c:function:: void hp_tanh_p(half* Input0, half* output,int length)
.. c:function:: void i8_hsigmoid_p(int8_t* Input0, float* output,int length)
.. c:function:: void fp_hsigmoid_p(float* Input0, float* output,int length)
.. c:function:: void hp_hsigmoid_p(half* Input0, half* output,int length)
.. c:function:: void i8_swish_p(int8_t* Input0, float* output,int length)
.. c:function:: void fp_swish_p(float* Input0, float* output,int length)
.. c:function:: void hp_swish_p(half* Input0, half* output,int length)
.. c:function:: void i8_hswish_p(int8_t* Input0, float* output,int length)
.. c:function:: void fp_hswish_p(float* Input0, float* output,int length)
.. c:function:: void hp_hswish_p(half* Input0, half* output,int length)
.. c:function:: void i8_hardtanh_p(int8_t* Input0, int8_t* output,int length, int8_t min_val, int8_t max_val)
.. c:function:: void fp_hardtanh_p(float* Input0, float* output,int length, float min_val, float max_val)
.. c:function:: void hp_hardtanh_p(half* Input0, half* output,int length, half min_val, half max_val)
.. c:function:: void i8_gelu_p(int8_t* Input0, float* output,int length, int approximate)
.. c:function:: void fp_gelu_p(float* Input0, float* output,int length, int approximate)
.. c:function:: void hp_gelu_p(half* Input0, half* output,int length, int approximate)
.. c:function:: void i8_softplus_p(int8_t* Input0, float* output,int length)
.. c:function:: void fp_softplus_p(float* Input0, float* output,int length)
.. c:function:: void hp_softplus_p(half* Input0, half* output,int length)
.. c:function:: void i8_elu_p(int8_t* Input0, float* output,int length, float alpha)
.. c:function:: void fp_elu_p(float* Input0, float* output,int length, float alpha)
.. c:function:: void hp_elu_p(half* Input0, half* output,int length, half alpha)
.. c:function:: void i8_celu_p(int8_t* Input0, float* output,int length, float alpha)
.. c:function:: void fp_celu_p(float* Input0, float* output,int length, float alpha)
.. c:function:: void hp_celu_p(half* Input0, half* output,int length, half alpha)
.. c:function:: void i8_hardshrink_p(int8_t* Input0, int8_t* output,int length, int8_t lambd)
.. c:function:: void fp_hardshrink_p(float* Input0, float* output,int length, float lambd)
.. c:function:: void hp_hardshrink_p(half* Input0, half* output,int length, half lambd)
.. c:function:: void i8_softshrink_p(int8_t* Input0, int8_t* output,int length, int8_t lambd)
.. c:function:: void fp_softshrink_p(float* Input0, float* output,int length, float lambd)
.. c:function:: void hp_softshrink_p(half* Input0, half* output,int length, half lambd)
.. c:function:: void i8_softsignopt_p(int8_t* Input0, float* output,int length)
.. c:function:: void fp_softsignopt_p(float* Input0, float* output,int length)
.. c:function:: void hp_softsignopt_p(half* Input0, half* output,int length)

**C example:**

    .. code-block:: c
        :linenos:
        :emphasize-lines: 9

        // FT78NE example
        #include <stdio.h>
        #include <activation.h>

        int main(int argc, char* argv[]) {
            float *input0 = (float *)0x10000000;   // input resides in DDR
            float *output = (float *)0x10004000;
            int length = 1000;
            fp_tanh_p(input0, output, length);
            return 0;
        }
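
The example hard-codes the input and output addresses ``0x4000`` bytes (16 KB) apart; 1000 ``float`` elements occupy 4000 bytes, so the two buffers do not overlap. This section does not state whether in-place operation is supported, so when picking fixed addresses it may be safer to keep the buffers disjoint. A small check, as a sketch; ``buffers_overlap`` is a hypothetical helper, not part of the library:

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper, not part of activation.h: returns 1 if the byte
     * ranges [in_addr, in_addr + length * elem_size) and
     * [out_addr, out_addr + length * elem_size) overlap, otherwise 0. */
    int buffers_overlap(uintptr_t in_addr, uintptr_t out_addr,
                        int length, size_t elem_size) {
        size_t bytes = (size_t)length * elem_size;
        return in_addr < out_addr + bytes && out_addr < in_addr + bytes;
    }

With the addresses in the example, up to 4096 ``float`` elements fit before the input region would run into the output region.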